parameter control
Learning Adaptive Evolutionary Computation for Solving Multi-Objective Optimization Problems
Coppens, Remco, Reijnen, Robbert, Zhang, Yingqian, Bliek, Laurens, Steenhuisen, Berend
Multi-objective evolutionary algorithms (MOEAs) are widely used to solve multi-objective optimization problems. These algorithms rely on appropriate parameter settings to find good solutions. However, parameter tuning can be very computationally expensive when solving non-trivial (combinatorial) optimization problems. This paper proposes a framework that integrates MOEAs with adaptive parameter control using Deep Reinforcement Learning (DRL). The DRL policy is trained to adaptively set the values that dictate the intensity and probability of mutation for solutions during optimization. We test the proposed approach on a simple benchmark problem and on a complex, real-world warehouse design and control problem. The experimental results demonstrate the advantages of our method in terms of solution quality and the computation time needed to reach good solutions. In addition, we show that the learned policy is transferable, i.e., the policy trained on the simple benchmark problem can be applied directly and effectively to the complex warehouse optimization problem without retraining.
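The control loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `stub_policy` is a hand-written rule standing in for a trained DRL policy, the two state features (`progress`, `diversity`) are assumptions chosen for illustration, and selection and the multi-objective machinery are omitted so that only the per-generation setting of mutation probability and intensity is shown.

```python
import random


def stub_policy(progress, diversity):
    """Hypothetical stand-in for a trained DRL policy.

    Maps two assumed state features to (mutation_probability, mutation_intensity).
    """
    prob = 0.05 + 0.45 * (1.0 - diversity)  # mutate more often when diversity is low
    sigma = 0.5 * (1.0 - progress) + 0.01   # shrink the step size as the run progresses
    return prob, sigma


def mutate(solution, prob, sigma, rng):
    """Gaussian mutation: each gene is perturbed with probability `prob`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < prob else g
            for g in solution]


def diversity_of(population):
    """Crude diversity feature in [0, 1]: spread of the first gene, capped at 1."""
    values = [ind[0] for ind in population]
    return min(1.0, max(values) - min(values))


def run(generations=20, pop_size=10, genes=3, seed=0):
    """Run the control loop and return the final population and the chosen parameters."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genes)]
                  for _ in range(pop_size)]
    history = []
    for gen in range(generations):
        progress = gen / generations
        # The policy is queried once per generation with the current state.
        prob, sigma = stub_policy(progress, diversity_of(population))
        history.append((prob, sigma))
        population = [mutate(ind, prob, sigma, rng) for ind in population]
    return population, history
```

In a full system the stub would be replaced by a learned policy and the loop would include evaluation and selection; the sketch only shows where the policy's outputs enter the mutation operator.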
Quality-Diversity Meta-Evolution: customising behaviour spaces to a meta-objective
Bossens, David M., Tarapore, Danesh
However, it was widely known that successfully converging to the maximum of that fitness function requires maintaining genetic diversity in the population of solutions (e.g., [1-4]). Moreover, the use of niching demonstrated how maintaining subpopulations could help find multiple solutions to a single problem [5]. Some studies included genetic diversity as one of the objectives of the EA [6]. Approaches in evolutionary robotics, artificial life, and neuro-evolution realised that genetic diversity does not necessarily imply a diversity of solutions, since (i) different genotypes may encode the same behaviour and vice versa; and (ii) many genotypes may encode unsafe or undesirable solutions that should be discarded during evolution (e.g., when a robot crashes into an obstacle). Such approaches began to emphasise behavioural diversity [7-10], not only as a driver for objective-based evolution but also as the enabler for diversity- or novelty-driven evolution [11]. In quality-diversity (QD) algorithms such as MAP-Elites [12] and Novelty Search with Local Competition [13], the behavioural diversity approach is combined with local competition such that the best solution for each local region in the behaviour space is stored, forming a large archive of high-quality solutions. The development of quality-diversity algorithms has allowed a plethora of applications.
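The archive mechanism described for MAP-Elites, keeping the best solution found in each region of the behaviour space, can be sketched as follows. This is a simplified illustration rather than the reference implementation: the toy `evaluate` function, the two-dimensional descriptor, the grid resolution, and the 100 random initial samples are all assumptions made for the example.

```python
import random


def evaluate(x):
    """Toy problem (an assumption): fitness is the negative sum of squares,
    and the behaviour descriptor is the first two genes, each in [0, 1]."""
    fitness = -sum(g * g for g in x)
    return fitness, (x[0], x[1])


def cell_of(descriptor, bins):
    """Map a descriptor in [0, 1]^2 to a grid cell index."""
    return tuple(min(bins - 1, int(d * bins)) for d in descriptor)


def map_elites(iterations=2000, bins=5, genes=4, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)
    for i in range(iterations):
        if i < 100:
            # Initial phase: sample random solutions to seed the archive.
            child = [rng.random() for _ in range(genes)]
        else:
            # Afterwards: pick a stored elite uniformly and mutate it.
            _, parent = rng.choice(list(archive.values()))
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma)))
                     for g in parent]
        fitness, descriptor = evaluate(child)
        cell = cell_of(descriptor, bins)
        # Local competition: a child only replaces the elite of its own cell.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive
```

Calling `map_elites()` returns a dictionary with at most `bins * bins` entries, one elite per occupied cell; competition is local because a solution is only ever compared with the current occupant of its own cell.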
On the use of feature-maps and parameter control for improved quality-diversity meta-evolution
Bossens, David M., Tarapore, Danesh
Historically, most evolutionary algorithms (EAs) were designed to optimise a fitness function, solving a single problem without considerations for generalisation to unseen problems or robustness to perturbations to the evaluation environment. However, it was widely known that successfully converging to the maximum of that fitness function requires maintaining genetic diversity in the population of solutions (see, e.g., Laumanns et al. (2002); Gupta and Ghafir (2012); Ursem (2002); Ginley et al. (2011)). Moreover, the use of niching demonstrated how maintaining subpopulations could help find multiple solutions to a single problem (Mahfoud, 1995). Some studies included genetic diversity as one of the objectives of the EA (Toffolo and Benini, 2003). Approaches in evolutionary robotics, artificial life, and neuro-evolution realised that genetic diversity does not necessarily imply a diversity of solutions, since (i) different genotypes may encode the same behaviour and vice versa (especially for complex genotypes such as neural networks); and (ii) many genotypes may encode unsafe or undesirable solutions that should be discarded during evolution (e.g., self-collisions on a multi-joint robot arm). Such approaches began to emphasise behavioural diversity (Mouret and Doncieux, 2009b; Gomez, 2009; Mouret and Doncieux, 2009a; Mouret, 2010), not only as a driver for objective-based evolution but also as the enabler for diversity- or novelty-driven evolution (Lehman and Stanley, 2011a). This work is the extended version of the paper: David M. Bossens & Danesh Tarapore (2021). On the use of feature-maps for improved quality-diversity meta-evolution.
Mixed-Initiative Level Design with RL Brush
Delarosa, Omar, Dong, Hang, Ruan, Mindy, Khalifa, Ahmed, Togelius, Julian
Modern games often rely on procedural content generation (PCG) to create large amounts of content autonomously or with limited or no human input. PCG methods are used with many different design goals in mind, including enabling a particular aesthetic. They can also be used to streamline time-intensive tasks such as modeling and designing thousands of unique tree assets for a forest environment. Procedurally generated content has been used in games since the early 1980s. Early PCG-enabled games like Rogue (Michael Toy, 1980) used PCG to expand the overall depth of the game by generating dungeons as well as coping with the hardware limitations of the day (Yannakakis and Togelius, 2018). This section will lay out more contemporary applications and methods of generating game content
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Khadka, Shauharda, Majumdar, Somdeb, Tumer, Kagan
A key challenge for Multiagent RL (Reinforcement Learning) is the design of agent-specific, local rewards that are aligned with sparse global objectives. In this paper, we introduce MERL (Multiagent Evolutionary RL), a hybrid algorithm that does not require an explicit alignment between local and global objectives. MERL uses fast, policy-gradient based learning for each agent by utilizing their dense local rewards. Concurrently, an evolutionary algorithm is used to recruit agents into a team by directly optimizing the sparser global objective. We explore problems that require coupling (a minimum number of agents required to coordinate for success), where the degree of coupling is not known to the agents. We demonstrate that MERL's integrated approach is more sample-efficient and retains performance better with increasing coupling orders compared to MADDPG, the state-of-the-art policy-gradient algorithm for multiagent coordination.
Collaborative Evolutionary Reinforcement Learning
Khadka, Shauharda, Majumdar, Somdeb, Nassar, Tarek, Dwiel, Zach, Tumer, Evren, Miret, Santiago, Liu, Yinyin, Tumer, Kagan
Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically struggle with achieving effective exploration and are extremely sensitive to the choice of hyperparameters. One reason is that most approaches use a noisy version of their operating policy to explore, thereby limiting the range of exploration. In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. A collection of learners, typically proven algorithms like TD3, optimize over varying time-horizons, leading to this diverse portfolio. All learners contribute to and use a shared replay buffer to achieve greater sample efficiency. Computational resources are dynamically distributed to favor the best learners as a form of online algorithm selection. Neuroevolution binds this entire process to generate a single emergent learner that exceeds the capabilities of any individual learner. Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the MuJoCo Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
Evolutionary Reinforcement Learning
Khadka, Shauharda, Tumer, Kagan
Deep Reinforcement Learning (DRL) algorithms have been successfully applied to a range of challenging control tasks. However, these methods typically suffer from three core difficulties: temporal credit assignment with sparse rewards, lack of effective exploration, and brittle convergence properties that are extremely sensitive to hyperparameters. Collectively, these challenges severely limit the applicability of these approaches to real-world problems. Evolutionary Algorithms (EAs), a class of black-box optimization techniques inspired by natural evolution, are well suited to address each of these three challenges. However, EAs typically suffer from high sample complexity and struggle to solve problems that require optimization of a large number of parameters. In this paper, we introduce Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and periodically reinserts the RL agent into the EA population to inject gradient information into the EA. ERL inherits the EA's ability to perform temporal credit assignment with a fitness metric, effective exploration with a diverse set of policies, and the stability of a population-based approach, and complements these with off-policy DRL's ability to leverage gradients for higher sample efficiency and faster learning. Experiments in a range of challenging continuous control benchmark tasks demonstrate that ERL significantly outperforms prior DRL and EA methods, achieving state-of-the-art performance.